Prewarm LLM cache #6692
@cache
@lru_cache(maxsize=1)
cache is basically lru_cache(maxsize=None).
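For reference, a minimal sketch of how the two decorators differ. The function names `load_unbounded` and `load_single` are hypothetical and not taken from this PR; `functools.cache` requires Python 3.9 or later.

```python
from functools import cache, lru_cache

calls = []  # records every real (uncached) invocation


@cache  # equivalent to lru_cache(maxsize=None): unbounded, never evicts
def load_unbounded(name):
    calls.append(("unbounded", name))
    return f"model:{name}"


@lru_cache(maxsize=1)  # keeps only the most recently used result
def load_single(name):
    calls.append(("single", name))
    return f"model:{name}"


# Request "a", then "b", then "a" again from both functions.
for name in ("a", "b", "a"):
    load_unbounded(name)
    load_single(name)

# The unbounded cache still holds "a" on the third call; the
# single-slot cache evicted it when "b" was stored.
unbounded_calls = sum(1 for kind, _ in calls if kind == "unbounded")
single_calls = sum(1 for kind, _ in calls if kind == "single")
```

For a function that is always called with no arguments, both decorators end up holding exactly one entry, so the difference only matters once distinct arguments are involved.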
❤️ this! I added a commit to limit when the LLM cache is pre-warmed.
@@ -1337,8 +1337,6 @@ def filter_exceptions(event, hint):

USER_INACTIVITY_DAYS = config("USER_INACTIVITY_DAYS", default=1095, cast=int)

if DEV:
    GOOGLE_APPLICATION_CREDENTIALS = config("GOOGLE_APPLICATION_CREDENTIALS", default="")
GOOGLE_APPLICATION_CREDENTIALS is only needed for local development, and doesn't have to be added to settings even for that case. It's never used within the dev, stage, or prod environments.
I love the idea of this, but it re-introduces the slow start-up issue that caused our deployment problems. I'm going to revert this. I don't think we need this anyway, because we're not really concerned with the initial startup cost when making the first LLM call.
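To illustrate the trade-off, here is a minimal sketch of lazy initialization versus pre-warming. It assumes a hypothetical `get_llm_client` factory and `prewarm` helper; the actual function names in this PR are not shown in the conversation.

```python
from functools import cache

build_count = 0  # how many times the expensive setup actually ran


@cache
def get_llm_client():
    """Hypothetical stand-in for an expensive LLM client setup."""
    global build_count
    build_count += 1
    return {"client": "ready"}


def prewarm():
    """Pay the construction cost up front, e.g. at process start."""
    get_llm_client()


# Lazy path: nothing is built at import time, so start-up stays fast.
# The first caller pays the setup cost; later callers reuse the result.
first = get_llm_client()
second = get_llm_client()
```

Calling `prewarm()` during start-up moves that one-time cost from the first request to process boot, which is the cost the revert avoids.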
This reverts commit c055b04.
No description provided.